Overview

Dataset Statistics

Number of Variables 6
Number of Rows 99441
Missing Cells 0
Missing Cells (%) 0.0%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 30.4 MB
Average Row Size in Memory 320.3 B
Variable Types
  • Numerical: 2
  • Categorical: 4
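
The overview figures above can be recomputed directly from the raw rows. The following is a minimal sketch in plain Python, assuming each row is a tuple of cell values and that a missing cell is represented as None. The function name overview_stats is illustrative and is not part of the profiling tool.

```python
from collections import Counter


def overview_stats(rows):
    """Return row count, missing-cell count, and duplicate-row count."""
    n_rows = len(rows)
    n_cells = sum(len(row) for row in rows)
    # Assumption: a missing cell is stored as None.
    missing = sum(cell is None for row in rows for cell in row)
    # A row counts as a duplicate each time it repeats an earlier identical row.
    duplicates = sum(count - 1 for count in Counter(rows).values())
    return {
        "rows": n_rows,
        "missing_cells": missing,
        "missing_pct": 100.0 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_pct": 100.0 * duplicates / n_rows if n_rows else 0.0,
    }
```

For this dataset, such a check would be expected to report 99441 rows with no missing cells and no duplicate rows, matching the table above.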

Dataset Insights

  • index is uniformly distributed (Uniform)
  • customer_id has a high cardinality: 99441 distinct values (High Cardinality)
  • customer_unique_id has a high cardinality: 96096 distinct values (High Cardinality)
  • customer_city has a high cardinality: 4119 distinct values (High Cardinality)
  • customer_id has constant length 32 (Constant Length)
  • customer_unique_id has constant length 32 (Constant Length)
  • customer_state has constant length 2 (Constant Length)
  • customer_id has all distinct values (Unique)
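
The cardinality and uniqueness insights above come down to counting distinct values per column. The sketch below shows one way to compute them, assuming a column is available as a list of values. The helper name cardinality is hypothetical.

```python
from collections import Counter


def cardinality(values):
    """Summarize how many distinct values a column holds."""
    if not values:
        raise ValueError("column is empty")
    counts = Counter(values)
    distinct = len(counts)
    return {
        "distinct": distinct,
        "unique_pct": 100.0 * distinct / len(values),
        "all_distinct": distinct == len(values),
        # Number of values that occur on more than one row.
        "repeated_values": sum(1 for c in counts.values() if c > 1),
    }
```

Read this way, customer_id is a row-level key (99441 distinct values in 99441 rows), while customer_unique_id has 96096 distinct values, about 96.6% of the row count, so some customer_unique_id values necessarily appear on more than one row.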

Variables


index

numerical

Approximate Distinct Count 99441
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1591056 B
Mean 49720
Minimum 0
Maximum 99440
Zeros 1
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • index is uniformly distributed

Quantile Statistics

Minimum 0
5-th Percentile 4972
Q1 24860
Median 49720
Q3 74580
95-th Percentile 94468
Maximum 99440
Range 99440
IQR 49720
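
The quantiles above follow from the index being a plain row counter. The sketch below assumes the values are exactly 0, 1, ..., n - 1 (consistent with the reported minimum, maximum, and distinct count) and assumes linear interpolation between order statistics. The profiling tool may compute quantiles differently, for example with an approximate algorithm, so this is a consistency check rather than a description of its method.

```python
def index_quantile(n_rows, percent):
    """Linear-interpolation quantile of the values 0, 1, ..., n_rows - 1.

    The q-th quantile sits at fractional position q * (n_rows - 1), and
    because the value at position k is k itself, that position is the answer.
    """
    return percent * (n_rows - 1) / 100


def index_iqr(n_rows):
    """Interquartile range (Q3 - Q1) under the same assumptions."""
    return index_quantile(n_rows, 75) - index_quantile(n_rows, 25)
```

With n_rows = 99441, these functions reproduce the 5th percentile, Q1, median, Q3, 95th percentile, and IQR listed above.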

Descriptive Statistics

Mean 49720
Standard Deviation 28706.2884
Variance 8.2405e+08
Sum 4.9442e+09
Skewness 0
Kurtosis -1.2
Coefficient of Variation 0.5774
  • index is not normally distributed (p-value 7.2594e-05)
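
The descriptive statistics above are what closed-form results for the integers 0, 1, ..., n - 1 predict. The sketch below evaluates those formulas. It assumes the index is exactly that sequence, and it uses the sample (n - 1 denominator) variance because that is the variant the reported standard deviation matches. The function name is illustrative.

```python
import math


def uniform_index_stats(n):
    """Closed-form statistics for the values 0, 1, ..., n - 1."""
    mean = (n - 1) / 2
    total = n * (n - 1) // 2
    # Sample variance (n - 1 denominator); the population variance
    # would be (n**2 - 1) / 12 instead.
    sample_var = n * (n + 1) / 12
    std = math.sqrt(sample_var)
    # Excess kurtosis of a discrete uniform distribution on n points.
    excess_kurtosis = -1.2 * (n * n + 1) / (n * n - 1)
    return {
        "mean": mean,
        "sum": total,
        "variance": sample_var,
        "std": std,
        "skewness": 0.0,  # the sequence is symmetric about its mean
        "excess_kurtosis": excess_kurtosis,
        "cv": std / mean,
    }
```

For n = 99441 this gives a mean of 49720, a standard deviation of about 28706.29, a coefficient of variation of about 0.5774 (close to 1/sqrt(3)), and an excess kurtosis of about -1.2, in line with the reported values.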

customer_id

categorical

Approximate Distinct Count 99441
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Memory Size 9645777 B

Length

Mean 32
Standard Deviation 0
Median 32
Minimum 32
Maximum 32

Sample

1st row 06b8999e2fba1a1fbc...
2nd row 18955e83d337fd6b2d...
3rd row 4e7b3e00288586ebd0...
4th row b2b6027bc5c5109e52...
5th row 4f2d8ab171c80ec836...

Letter

Count 1193579
Lowercase Letter 1193579
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 1988533
  • customer_id contains many words: 99441 words
  • customer_id has words of constant length
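
The labels in the Letter table match Unicode general category names, so the counts can be approximated by classifying each character. The sketch below assumes that mapping (Ll, Lu, Zs, Pd, Nd). Whether the profiling tool classifies characters exactly this way is an assumption, and the function name is hypothetical.

```python
import unicodedata
from collections import Counter

# Assumed mapping from Unicode general categories to the report's labels.
CATEGORY_LABELS = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}


def char_category_counts(values):
    """Count characters per category across all strings in a column."""
    counts = Counter()
    for value in values:
        for ch in value:
            label = CATEGORY_LABELS.get(unicodedata.category(ch))
            if label is not None:
                counts[label] += 1
    return counts
```

As a sanity check on the table above, the lowercase letters and decimal digits add up to 1193579 + 1988533 = 3182112 characters, which equals 99441 rows times the constant length of 32.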

customer_unique_id

categorical

Approximate Distinct Count 96096
Approximate Unique (%) 96.6%
Missing 0
Missing (%) 0.0%
Memory Size 9645777 B
  • The most frequent value (8d50f5eadf50201ccdcedfb9e2ac8455) occurs over 1.89 times as often as the second most frequent value (3e43e6105506432c953e165fb2acf44c)

Length

Mean 32
Standard Deviation 0
Median 32
Minimum 32
Maximum 32

Sample

1st row 861eff4711a542e4b9...
2nd row 290c77bc529b7ac935...
3rd row 060e732b5b29e8181a...
4th row 259dac757896d24d77...
5th row 345ecd01c38d18a903...

Letter

Count 1193030
Lowercase Letter 1193030
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 1989082
  • customer_unique_id contains many words: 96096 words
  • customer_unique_id has words of constant length

customer_zip_code_prefix

numerical

Approximate Distinct Count 14994
Approximate Unique (%) 15.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1591056 B
Mean 35137.4746
Minimum 1003
Maximum 99990
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • customer_zip_code_prefix is skewed right (γ1 = 0.779)

Quantile Statistics

Minimum 1003
5-th Percentile 3315
Q1 11347
Median 24416
Q3 58900
95-th Percentile 90550
Maximum 99990
Range 98987
IQR 47553

Descriptive Statistics

Mean 35137.4746
Standard Deviation 29797.939
Variance 8.8792e+08
Sum 3.4941e+09
Skewness 0.779
Kurtosis -0.7882
Coefficient of Variation 0.848
  • customer_zip_code_prefix is not normally distributed (p-value 6.9724e-07)
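
The skewness (γ1) and kurtosis figures above are moment-based shape measures. The sketch below computes the population versions from central moments. The profiling tool may apply a bias correction, so small differences from its output are possible, and the function name is illustrative.

```python
def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) from population central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    # Subtracting 3 makes a normal distribution score 0.
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skewness, excess_kurtosis
```

A positive skewness, such as the 0.779 reported here, indicates a longer tail toward high values, which agrees with the mean (35137.4746) lying well above the median (24416).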

customer_city

categorical

Approximate Distinct Count 4119
Approximate Unique (%) 4.1%
Missing 0
Missing (%) 0.0%
Memory Size 7492329 B
  • The most frequent value (sao paulo) occurs over 2.26 times as often as the second most frequent value (rio de janeiro)

Length

Mean 10.3445
Standard Deviation 3.9946
Median 9
Minimum 3
Maximum 32

Sample

1st row franca
2nd row sao bernardo do ca...
3rd row sao paulo
4th row mogi das cruzes
5th row campinas

Letter

Count 953332
Lowercase Letter 953332
Space Separator 74872
Uppercase Letter 0
Dash Punctuation 232
Decimal Number 2
  • customer_city contains many words: 3286 words

customer_state

categorical

Approximate Distinct Count 27
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 6662547 B
  • The most frequent value (SP) occurs over 3.25 times as often as the second most frequent value (RJ)

Length

Mean 2
Standard Deviation 0
Median 2
Minimum 2
Maximum 2

Sample

1st row SP
2nd row SP
3rd row SP
4th row SP
5th row SP

Letter

Count 198882
Lowercase Letter 0
Space Separator 0
Uppercase Letter 198882
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (SP, RJ) take over 50.0%
  • customer_state has words of constant length
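
The category insights above (the share held by the top 2 categories and the ratio between the two most frequent values) can be derived from a frequency table. The sketch below is a minimal version, assuming the column is a list of values with at least two distinct categories. The helper name is hypothetical.

```python
from collections import Counter


def top_category_insights(values):
    """Compare the two most frequent categories of a column."""
    ranked = Counter(values).most_common()
    if len(ranked) < 2:
        raise ValueError("need at least two distinct categories")
    (top, top_n), (second, second_n) = ranked[0], ranked[1]
    return {
        "top": top,
        "second": second,
        # How many times as often the top value occurs as the runner-up.
        "ratio": top_n / second_n,
        "top2_share_pct": 100.0 * (top_n + second_n) / len(values),
    }
```

For customer_state, a result with "top" equal to SP, "second" equal to RJ, a ratio above 3.25, and a top-2 share above 50% would correspond to the insights listed above.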

Interactions

Correlations

Missing Values